Search results for " Compressed Data Structures"

showing 2 items of 2 documents

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…

FOS: Computer and information sciencesBurrows–Wheeler transformSettore INF/01 - InformaticaCombinatorics on wordsFormal Languages and Automata Theory (cs.FL)Computer scienceString (computer science)Search engine indexingCompressed data structuresComputer Science - Formal Languages and Automata TheoryString indexingData structureMeasure (mathematics)Burrows-Wheeler-TransformRepetitivenessCombinatorics on wordsBurrows-Wheeler-Transform Compressed data structures String indexing Repetitiveness Combinatorics on wordsTransformation (function)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)AlgorithmData compression
researchProduct

Alignment-free Genomic Analysis via a Big Data Spark Platform

2021

Abstract Motivation Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results We fill this impo…

FOS: Computer and information sciencesStatistics and Probabilitysequence analysisComputer science0206 medical engineeringBig data02 engineering and technologyMachine learningcomputer.software_genreBiochemistry03 medical and health sciencesSpark (mathematics)MapReduceMolecular Biology030304 developmental biology0303 health sciencesSettore INF/01 - Informaticabusiness.industryBioinformatics High Performance Computing Compressed Data StructuresMapReduce; hadoop; sequence analysisComputer Science ApplicationsComputational MathematicsTask (computing)Computer Science - Distributed Parallel and Cluster ComputingComputational Theory and MathematicsDistributed Parallel and Cluster Computing (cs.DC)Artificial intelligencehadoopbusinesscomputer020602 bioinformaticsBioinformatics
researchProduct